Nature Machine Intelligence
Springer Science and Business Media LLC
All preprints, ranked by how well they match Nature Machine Intelligence's content profile, based on 61 papers previously published here. The average preprint has a 0.13% match score for this journal, so anything above that is already an above-average fit. Older preprints may already have been published elsewhere.
Leonardsen, E. H.; Persson, K.; Grodem, E.; Dinsdale, N.; Schellhorn, T.; Roe, J. M.; Vidal-Pineiro, D.; Sorensen, O.; Kaufmann, T.; Marquand, A.; Selbaek, G.; Andreassen, O. A.; Wolfers, T.; Westlye, L. T.; Wang, Y.
Deep learning approaches for clinical predictions based on magnetic resonance imaging data have shown great promise as a translational technology for diagnosis and prognosis in neurological disorders, but their clinical impact has been limited. This is partially attributed to the opaqueness of deep learning models, which leaves insufficient understanding of what underlies their decisions. To overcome this, we trained convolutional neural networks on structural brain scans to differentiate dementia patients from healthy controls, and applied layerwise relevance propagation to procure individual-level explanations of the model predictions. Through extensive validations we demonstrate that deviations recognized by the model corroborate existing knowledge of structural brain aberrations in dementia. By employing the explainable dementia classifier in a longitudinal dataset of patients with mild cognitive impairment, we show that the spatially rich explanations complement the model prediction when forecasting transition to dementia and help characterize the biological manifestation of disease in the individual brain. Overall, our work exemplifies the clinical potential of explainable artificial intelligence in precision medicine.
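Layerwise relevance propagation, as used in this abstract, redistributes a model's output score backward through the network layer by layer. A minimal sketch of the epsilon rule for a single linear layer, with toy weights (this is an illustration of the general technique, not the authors' implementation):

```python
import numpy as np

def lrp_epsilon(x, W, b, relevance_out, eps=1e-6):
    """Epsilon-rule LRP for a single linear layer z = W @ x + b.

    Output relevance is redistributed onto the inputs in proportion
    to each input's contribution z_ij = W_ij * x_j.
    """
    z = W @ x + b                                # forward pre-activations
    s = relevance_out / (z + eps * np.sign(z))   # stabilised relevance ratio
    return x * (W.T @ s)                         # relevance per input feature

# toy example: two input features, one output neuron
x = np.array([1.0, 3.0])
W = np.array([[0.5, 0.5]])
b = np.array([0.0])
R_in = lrp_epsilon(x, W, b, relevance_out=np.array([2.0]))
# with b = 0, the input relevances sum (approximately) to the output relevance
```

Applied to a convolutional classifier, the same backward pass yields a voxel-level relevance map, which is what makes the explanations spatially rich.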
Fisher, G. R.
Vision-language models (VLMs) pretrained on web-scale data have achieved remarkable performance across diverse tasks, leading to widespread adoption in industry. A natural question is whether these powerful representations transfer to specialized medical imaging domains, and whether domain-specific medical pretraining improves transfer. We tested these hypotheses using two VLMs on the NIH ChestX-ray14 benchmark: Qwen2.5-VL (pretrained on web data) and BiomedCLIP (pretrained on 15 million PubMed biomedical image-text pairs). Both models dramatically underperformed compared to convolutional neural networks (CNNs) with ImageNet pretraining. Across 5 random seeds, the best VLM achieved F1=0.196 ± 0.004 versus a CNN baseline of F1=0.811. Domain-specific pretraining provided marginal improvement: BiomedCLIP's frozen encoder achieved F1=0.161 ± 0.001 versus Qwen's F1=0.124 (+30%), but this remains clinically inadequate. Fine-tuning both models led to catastrophic overfitting, with sensitivity collapsing from >65% to <36% as the models learned to predict "no disease" for all inputs. These results demonstrate that neither general-purpose nor medical-specific vision-language pretraining produces features suitable for dense multi-label medical imaging classification. For chest X-ray diagnosis, traditional CNNs with ImageNet pretraining remain substantially more effective than VLM-based approaches.
Otani, Y.; Koga, D.; Wakizaka, Y.; Shimizu, H.
Drug-resistant infections pose a global health challenge and necessitate the rapid development of novel antibiotics. Although high-speed and high-accuracy in silico drug discovery methods using AI have been established, only a few approaches that specifically target antibiotic development have been developed. This gap significantly limits our ability to rapidly discover effective antibacterials against emerging resistant pathogens. Here, we have developed BaCNet, an AI system that accurately predicts the binding affinity between bacterial proteins and compounds using only amino acid sequences and compound SMILES representations. Our approach integrates a protein language model with three complementary compound embedding methods, achieving high prediction accuracy and effectively maintaining performance when tested on previously unseen bacterial species. BaCNet successfully rediscovered known antibiotics and identified promising novel candidates, with molecular dynamics simulations confirming stable binding of top hits. Moreover, by integrating a compound generation and optimization system with BaCNet, we discovered novel compounds not present in existing databases with significantly enhanced predicted antibacterial activity. BaCNet represents a promising platform that could accelerate the identification of urgently needed treatments against resistant pathogens.
Shi, M.; Zheng, H.; Gottumukkala, R.; Jonathan, N.; Armstong, G. W.; Shen, L. Q.; Wang, M.
Early screening for glaucoma and diabetic retinopathy (DR) is critical to prevent irreversible vision loss, yet remains inaccessible to many underserved populations. However, AI models trained on hospital-grade fundus images often generalize poorly to low-cost images acquired with portable devices such as smartphones. We propose CausalFund, a causality-inspired learning framework for training AI models that enable reliable low-resource screening from easily acquired non-clinical images. CausalFund disentangles disease-relevant retinal features from spurious image factors to achieve domain-generalizable screening across clinical and non-clinical settings. We integrated CausalFund with seven deep learning backbones for glaucoma and DR screening from portable-device fundus images, including lightweight architectures suitable for on-device deployment. Across diverse experimental settings and image quality conditions, CausalFund consistently improved AUC and achieved a more favorable sensitivity-specificity trade-off than conventional deep learning baselines. As a model-agnostic framework, CausalFund could be extended to other diseases and low-resource scenarios characterized by degraded or non-standard imaging.
Fisher, G. R.
We achieved state-of-the-art performance on the NIH ChestX-ray14 multi-label classification task using a simple 3-model ensemble: mean ROC-AUC 0.940, F1 0.821 (95% CI: 0.799-0.845), PR-AUC 0.827, sensitivity 76.0%, and specificity 98.8% across 14 thoracic diseases. Our primary finding challenges current research priorities: pretraining diversity dominates architectural diversity. Systematic evaluation of 255 ensemble combinations from 8 models spanning three architecture families (ConvNeXt, Vision Transformers, EfficientNet) at multiple resolutions (224x224 to 384x384) revealed that a simple 3-model ConvNeXt ensemble combining ImageNet-1K, ImageNet-21K, and ImageNet-21K-384 pretrained variants outperformed all 252 alternative combinations, including modern Vision Transformers and efficiency-optimized architectures. This ensemble achieved mean ROC-AUC 0.940, exceeding recent hybrid transformer approaches (LongMaxViT [1]: 0.932) with substantially lower computational requirements. Systematic comparison of five optimization strategies (F1, F_SS, pure sensitivity, Youden's J, validation loss) established that clinical metric optimization outperforms traditional validation loss by 19.5% in F1 score. F_SS optimization (sensitivity-specificity harmonic mean) achieved optimal clinical balance: highest sensitivity (73.9%), best Youden's J (0.727), and superior threshold-independent performance (ROC-AUC, PR-AUC). Traditional validation loss optimization failed to align with diagnostic utility despite achieving mathematical convergence. Strategic pretraining selection and clinical metric optimization provide greater performance improvements than architectural innovation alone, enabling competitive state-of-the-art results on accessible computational resources (AWS g5.2xlarge, $1.21/hr).
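The F_SS metric named above is the harmonic mean of sensitivity and specificity, and Youden's J is their sum minus one. A small sketch of how a decision threshold could be scored these ways (the confusion-matrix counts below are illustrative, not the paper's data):

```python
def sensitivity(tp, fn):
    """True positive rate: fraction of diseased cases detected."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """True negative rate: fraction of healthy cases correctly cleared."""
    return tn / (tn + fp)

def f_ss(sens, spec):
    """Harmonic mean of sensitivity and specificity (the F_SS criterion)."""
    return 2 * sens * spec / (sens + spec)

def youdens_j(sens, spec):
    """Youden's J statistic: sensitivity + specificity - 1."""
    return sens + spec - 1

# illustrative counts for one disease label at one threshold
sens = sensitivity(tp=739, fn=261)   # 73.9% sensitivity
spec = specificity(tn=988, fp=12)    # 98.8% specificity
score = f_ss(sens, spec)
```

Because the harmonic mean punishes whichever of the two rates is lower, optimizing F_SS discourages the degenerate "predict no disease" solution that plain validation-loss optimization can fall into.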
Datta, S. K.; Shaikh, M. A.; Srihari, S. N.; Gao, M.
In clinical applications, neural networks must focus on and highlight the most important parts of an input image. The Soft-Attention mechanism enables a neural network to achieve this goal. This paper investigates the effectiveness of Soft-Attention in deep neural architectures. The central aim of Soft-Attention is to boost the value of important features and suppress the noise-inducing features. We compare the performance of VGG, ResNet, Inception ResNet v2 and DenseNet architectures with and without the Soft-Attention mechanism, while classifying skin lesions. When coupled with Soft-Attention, the original network outperforms the baseline [16] by 4.7% while achieving a precision of 93.7% on the HAM10000 dataset [25]. Additionally, Soft-Attention coupling improves the sensitivity score by 3.8% compared to the baseline [31] and achieves 91.6% on the ISIC-2017 dataset [2]. The code is publicly available at github1.
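The core idea of a soft-attention module is a softmax map over spatial locations that rescales the feature map, amplifying salient regions and damping noisy ones. A numpy sketch of that rescaling step (shapes, the projection `w`, and the residual form are all illustrative assumptions, not the paper's exact module):

```python
import numpy as np

def soft_attention(features, w, gamma=1.0):
    """Soft attention over an (H, W, C) feature map.

    A scalar score per spatial location is computed with a learned
    projection w, turned into a softmax attention map, and used to
    rescale the features (boosting salient regions, suppressing noise).
    """
    H, Wd, C = features.shape
    scores = features.reshape(H * Wd, C) @ w        # one score per location
    attn = np.exp(scores - scores.max())            # stabilised exponentials
    attn = attn / attn.sum()                        # softmax attention map
    attn = attn.reshape(H, Wd, 1)                   # broadcast over channels
    return features + gamma * attn * features       # residual rescaling

rng = np.random.default_rng(0)
feats = rng.standard_normal((4, 4, 8))              # toy conv feature map
w = rng.standard_normal(8)                          # toy learned projection
out = soft_attention(feats, w)
```

In a full network this sits between convolutional blocks, so the classifier downstream sees attention-reweighted features rather than the raw map.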
Mascart, C.; Tran, K.; Samoilova, K.; Storan, L. T.; Liu, T.; Koulakov, A.
Recent advances in deep learning have enabled prediction of odorant perception from molecular structure, opening new avenues for odor classification. However, most existing models are limited to predicting percepts from fixed vocabularies and fail to capture the full richness of olfactory experience. Progress is further limited by the scarcity of large-scale olfactory datasets and the lack of standardized metrics for evaluating free-form natural-language odor descriptions. To address these challenges, we introduce Odor Description and Inference Evaluation Understudy (ODIEU), a benchmark that includes perceptual descriptions of over 10,000 molecules paired with a model-based metric for evaluating free-form odor text descriptions. The model-based metric uses Sentence-BERT (SBERT) models which are fine-tuned on olfactory descriptions to allow better evaluation of human-generated odor texts. Using the fine-tuned SBERT models, we show that free-form text odor descriptions contain additional perceptual information in their syntactic structure compared to semantic labels. We further introduce CIRANO (Chemical Information Recognition and Annotation Network for Odors), a transformer-based model that generates free-form odor descriptions directly from molecular structure, thus implementing molecular structure-to-text (S2T) prediction. CIRANO achieves performance comparable to humans. Finally, we generate human-like descriptions from mouse olfactory bulb neural data using an invertible SBERT model, yielding neural-to-text (N2T) predictions highly aligned with human descriptions. Together, CIRANO and ODIEU establish a standardized framework for generating natural language olfactory descriptions and evaluating their alignment with human perception. Code is available at https://github.com/KoulakovLab/ODIEU
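An embedding-based text metric of this kind typically encodes the candidate and reference descriptions and scores their cosine similarity. A hedged sketch of the scoring step, with toy vectors standing in for SBERT embeddings (the max-over-references pooling is one common choice, not necessarily ODIEU's):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def description_score(candidate_vec, reference_vecs):
    """Score a generated odor description against several human references
    by taking the best cosine similarity (toy stand-in for an SBERT metric)."""
    return max(cosine_similarity(candidate_vec, r) for r in reference_vecs)

# toy 3-d "embeddings": candidate is close to the first reference
cand = [1.0, 0.0, 1.0]
refs = [[1.0, 0.0, 0.9], [0.0, 1.0, 0.0]]
score = description_score(cand, refs)
```

Fine-tuning the encoder on olfactory text, as the abstract describes, is what makes such similarities track perceptual closeness rather than generic semantic overlap.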
Sun, Y.; Chen, K.; Liu, K.; Ye, Q.
Self-supervised learning on 3D molecular structures is gaining importance in data-driven scientific research and applications due to the high costs of annotating bio-chemical data. However, the strategic selection of semantic units for modeling 3D molecular structures remains underexplored, despite its crucial role in effective pre-training, a concept well established in language processing and computer vision. We introduce Localized Geometric Generation (LEGO), a novel approach that treats tetrahedrons within 3D molecular structures as fundamental building blocks, leveraging their geometric simplicity and widespread presence across chemical functional patterns. Inspired by masked modeling, LEGO perturbs tetrahedral local structures and learns to reconstruct them in a self-supervised manner. Experimental results demonstrate that LEGO consistently enhances molecular representations across biochemistry and quantum property prediction benchmarks. Additionally, the tetrahedral modeling and pretraining generalize from small molecules to larger molecular systems, as validated by protein-ligand affinity prediction. Our results highlight the potential of selecting semantic units to build more expressive and interpretable neural networks for scientific AI applications.
Shah, R.; Moradi, M.; Eslami, S.; Fujita, A.; Aziz, K.; Bineshfar, N.; Elze, T.; Eslami, M.; Kazeminasab, S.; Liebman, D.; Rasouli, S.; Vu, D.; Wang, M.; Yohannan, J.; Zebardast, N.
Glaucoma is a leading cause of irreversible blindness worldwide, with early intervention often being crucial. Research into the underpinnings of glaucoma often relies on electronic health records (EHRs) to identify patients with glaucoma and their subtypes. However, current methods for identifying glaucoma patients from EHRs are often inaccurate or infeasible at scale, relying on International Classification of Diseases (ICD) codes or manual chart reviews. To address this limitation, we introduce (1) OphthaBERT, a powerful general clinical ophthalmology language model trained on over 2 million diverse clinical notes, and (2) a fine-tuned variant of OphthaBERT that automatically extracts binary and subtype glaucoma diagnoses from clinical notes. The base OphthaBERT model is a robust encoder, outperforming state-of-the-art clinical encoders in masked token prediction on out-of-distribution ophthalmology clinical notes and binary glaucoma classification with limited data. We report significant binary classification performance improvements in low-data regimes (p < 0.001, Bonferroni corrected). OphthaBERT is also able to achieve superior classification performance for both binary and subtype diagnosis, outperforming even fine-tuned large decoder-only language models at a fraction of the computational cost. We demonstrate a 0.23-point increase in macro-F1 for subtype diagnosis over ICD codes and strong binary classification performance when externally validated at Wilmer Eye Institute. OphthaBERT provides an interpretable, equitable framework for general ophthalmology language modeling and automated glaucoma diagnosis.
Stoffl, L.; Bonnetto, A.; d'Ascoli, S.; Mathis, A.
Natural behavior is hierarchical. Yet, there is a paucity of benchmarks addressing this aspect. Recognizing the scarcity of large-scale hierarchical behavioral benchmarks, we create a novel synthetic basketball playing benchmark (Shot7M2). Beyond synthetic data, we extend BABEL into a hierarchical action segmentation benchmark (hBABEL). Then, we develop a masked autoencoder framework (hBehaveMAE) to elucidate the hierarchical nature of motion capture data in an unsupervised fashion. We find that hBehaveMAE learns interpretable latents on Shot7M2 and hBABEL, where lower encoder levels show a superior ability to represent fine-grained movements, while higher encoder levels capture complex actions and activities. Additionally, we evaluate hBehaveMAE on MABe22, a representation learning benchmark with short and long-term behavioral states. hBehaveMAE achieves state-of-the-art performance without domain-specific feature extraction. Together, these components synergistically contribute towards unveiling the hierarchical organization of natural behavior. Models and benchmarks are available at https://github.com/amathislab/BehaveMAE.
Hao, M.; Gong, J.; Zeng, X.; Liu, C.; Guo, Y.; Cheng, X.; Wang, T.; Ma, J.; Song, L.; Zhang, X.
Large-scale pretrained models have become foundation models leading to breakthroughs in natural language processing and related fields. Developing foundation models in life science for deciphering the "languages" of cells and facilitating biomedical research is promising yet challenging. We developed a large-scale pretrained model scFoundation with 100M parameters for this purpose. scFoundation was trained on over 50 million human single-cell transcriptomic profiles, which contain high-throughput observations of the complex molecular features in all known types of cells. scFoundation is currently the largest model in terms of the size of trainable parameters, dimensionality of genes and the number of cells used in the pre-training. Experiments showed that scFoundation can serve as a foundation model for single-cell transcriptomics and achieve state-of-the-art performance in a diverse array of downstream tasks, such as gene expression enhancement, tissue drug response prediction, single-cell drug response classification, and single-cell perturbation prediction.
Chen, Y.; Bian, H.; Wei, L.; Jia, J.; Dong, X.; Li, Y.; Zhao, Y.; Wu, X.; Li, C.; Luo, E.; Xiao, C.; Hao, M.; Zhang, X.
Cells can be viewed as complex stories written by coordinated expression of genes. The success of AI large language models (LLMs) in mastering human language inspired us to develop a large AI model scMulan with 368 million parameters to generate cell transcriptomics with designated attributes by learning the cell language. We defined a unified c-sentence to incorporate cell transcriptomics and meta-attributes, and pre-trained scMulan on the equivalent of 100 million human cells. Experiments showed that scMulan can generate designated pseudo transcriptomics, predict missing attributes of cells, reconstruct unobserved cells along functional gradients, and can help to identify driving regulators of cell fates. The generated data passed tests of current tools and can reflect the underlying biology.
Cai, T.; Abbu, K. A.; Liu, Y.; Xie, L.
Drug discovery has witnessed intensive exploration of the problem of drug-target physical interactions for over two decades; however, a strong drug binding affinity to a single target often fails to translate into desired clinical outcomes. A critical knowledge gap needs to be filled to correlate drug-target interactions with phenotypic responses: predicting receptor activity or functional selectivity upon ligand binding (i.e., agonist vs. antagonist) on a genome scale and for novel chemicals. Two major obstacles compound the difficulty in this direction: known receptor-activity data are far too scarce to train a robust model for genome-scale applications, and real-world applications need to deploy a model on data from various shifted distributions. To address these challenges, we have developed an end-to-end deep learning framework, DeepREAL, for multi-scale modeling of genome-wide receptor activities of ligand binding. DeepREAL utilizes self-supervised learning on tens of millions of protein sequences and pre-trained binary interaction classification to solve the data distribution shift and data scarcity problems. Extensive benchmark studies that simulate real-world scenarios demonstrate that DeepREAL achieves state-of-the-art performance in out-of-distribution settings.
Kwee, B. P. Y.; Messemaker, M.; Marcus, E.; Oliveira, G.; Scheper, W.; Wu, C.; Teuwen, J.; Schumacher, T.
The prediction of peptide-MHC (pMHC) recognition by αβ T-cell receptors (TCRs) remains a major biomedical challenge. Here, we develop STAPLER (Shared TCR And Peptide Language bidirectional Encoder Representations from transformers), a transformer language model that uses a joint TCRβ-peptide input to allow the learning of patterns within and between TCRβ and peptide sequences that encode recognition. First, we demonstrate how data leakage during negative data generation can confound performance estimates of neural network-based models in predicting TCR-pMHC specificity. We then demonstrate that, because of its pre-training and fine-tuning masked language modeling tasks, STAPLER outperforms both neural network-based and distance-based ML models in predicting the recognition of known antigens in an independent dataset, in particular for antigens for which little related data is available. Based on this ability to efficiently learn from limited labeled TCR-peptide data, STAPLER is well suited to utilize growing TCR-pMHC datasets to achieve accurate prediction of TCR-pMHC specificity.
Wei, Z.; Li, M. L.
We introduce the History-Guided Deep Compartmental Model (HG-DCM), a novel framework for early-stage pandemic forecasting that synergizes deep learning with compartmental modeling to harness the strengths of both interpretability and predictive capacity. HG-DCM employs a Residual Convolutional Neural Network (RCNN) to learn temporal patterns from historical and current pandemic data while incorporating epidemiological and demographic metadata to infer interpretable parameters for a compartmental model to forecast future pandemic growth. Experimental results on early-stage COVID-19 and Monkeypox forecasting tasks demonstrate that HG-DCM outperforms both standard compartmental models (e.g., DELPHI) and standalone deep neural networks (e.g., GRU) in predictive accuracy and stability, particularly with limited data. By effectively integrating historical pandemic insights, HG-DCM offers a scalable approach for interpretable and accurate forecasting, laying the groundwork for future real-time pandemic modeling applications.
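In a hybrid of this kind, the network's job reduces to producing interpretable epidemiological parameters (e.g., transmission and recovery rates) that a classical compartmental model then integrates forward. A minimal discrete-time SIR step as a generic stand-in for the DELPHI-style compartmental component (the parameter values below are illustrative, not learned):

```python
def sir_step(S, I, R, beta, gamma, N, dt=1.0):
    """One forward-Euler step of the SIR compartmental model.

    beta  - transmission rate (in HG-DCM-like hybrids, inferred by the network)
    gamma - recovery rate
    N     - total population (S + I + R is conserved)
    """
    new_infections = beta * S * I / N * dt
    new_recoveries = gamma * I * dt
    return (S - new_infections,
            I + new_infections - new_recoveries,
            R + new_recoveries)

# forecast a short horizon with fixed illustrative parameters
S, I, R = 990.0, 10.0, 0.0
for _ in range(30):
    S, I, R = sir_step(S, I, R, beta=0.3, gamma=0.1, N=1000.0)
```

Because the forecast is generated by the mechanistic model, the learned parameters stay directly interpretable, which is the interpretability-plus-capacity trade the abstract describes.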
Kawakami, T.; Hosokawa, S.; Masamichi, I.; Kurozumi, A.; Tanaka, R.; Minatsuki, S.; Ishida, J.; Isagawa, T.; Kodera, S.; Takeda, N.
Single-cell RNA sequencing (scRNA-seq) of patient samples holds promise for understanding disease mechanisms, but faces the challenge of excessive cost and effort in acquisition, processing, and data analysis, making it essential to leverage existing data. Pulmonary artery hypertension (PAH) is a refractory disease characterized by pulmonary vascular remodeling, and access to patient specimens is limited due to difficulties in tissue collection. In this study, we employed transfer learning with Geneformer, a deep learning algorithm pre-trained with scRNA-seq datasets, and fine-tuned it with public PAH lung tissue data to identify disease-relevant genes. The resulting algorithm, which we named PAH-former, demonstrated that its prediction accuracy varied significantly depending on the dataset used for fine-tuning. PAH-former enabled us to perform in silico perturbation analysis and identified PAH-related genes. Loss of function of PAH-related genes in human pulmonary artery endothelial cells increased the expression of SOX18, a signature gene of PAH. This integration of artificial intelligence and biological experiments can significantly advance our understanding of the molecular mechanisms of PAH.
Wang, W.; Qi, C.; Wei, Z.
Accurately modeling the binding between T-cell receptors (TCRs) and peptide-MHC (pMHC) complexes is essential for guiding immunotherapy development and personalized vaccine design. However, the vast diversity of TCR repertoires and the scarcity of experimentally validated interactions make generalization to unseen epitopes challenging. This paper proposes TIDE, a cross-attention-driven dual-encoder framework that leverages large protein and molecular language models to learn discriminative representations of TCRs and peptides. In TIDE, TCR sequences are encoded using Evolutionary Scale Modeling (ESM), while peptides are transformed into SMILES strings and processed by MolFormer to capture chemical and structural properties. Multi-layer cross-attention then refines and integrates these embeddings, highlighting interaction-relevant patterns without requiring explicit structural alignment. Evaluated on the TCHard benchmark under both zero-shot and few-shot settings, TIDE achieves superior predictive accuracy and robustness compared to state-of-the-art baselines such as ChemBERTa, TITAN, and NetTCR. These results demonstrate that combining pretrained language models with cross-attention fusion offers a powerful approach for TCR-pMHC binding prediction and paves the way for more reliable computational immunology applications.
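Cross-attention of the kind described lets each TCR token attend over the peptide embedding (and vice versa) so that interaction-relevant positions are weighted up without structural alignment. A single-head scaled dot-product cross-attention in numpy (shapes and names are illustrative, not TIDE's actual code):

```python
import numpy as np

def cross_attention(queries, keys, values):
    """Single-head scaled dot-product cross-attention.

    queries      : (Lq, d) e.g. TCR token embeddings from a protein LM
    keys, values : (Lk, d) e.g. peptide token embeddings
    """
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)           # (Lq, Lk) pairwise affinities
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over peptide tokens
    return weights @ values                          # (Lq, d) fused representation

rng = np.random.default_rng(1)
tcr = rng.standard_normal((12, 16))   # 12 TCR tokens, 16-d toy embeddings
pep = rng.standard_normal((9, 16))    # 9 peptide tokens
fused = cross_attention(tcr, pep, pep)
```

Stacking several such layers in both directions, as the abstract describes, lets the two encoders exchange information before the final binding prediction head.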
Zhao, Y.; Zhao, B.; Zhang, F.; He, C.; Wu, W.; Lai, L.
The rapid advancement of single-cell sequencing technology has significantly deepened our understanding of cellular heterogeneity, yet it concurrently presents substantial challenges for the unified modeling of single-cell data. Simultaneously, pre-trained foundation models have achieved notable success in domains such as natural language processing and image analysis. However, extending these models to accommodate ultra-long single-cell transcriptome sequences, characterized by an extensive number of genes, remains a formidable task. In this study, we introduce SC-MAMBA2, based on the MAMBA2 architecture, meticulously designed with a bidirectional modeling approach tailored for single-cell transcriptomics data. As the first single-cell foundation model to integrate the state-space models (SSMs) underlying the MAMBA2 architecture, SC-MAMBA2 features over 625 million parameters, covers more than 60,000 genes, and was pre-trained on a dataset of over 57 million cells, making it the most comprehensive solution for processing ultra-long transcriptome sequences. Extensive benchmarking across a diverse array of downstream tasks consistently demonstrates that SC-MAMBA2 surpasses state-of-the-art models, delivering superior accuracy and enhanced computational efficiency.
Wu, C.; Restrepo, D.; Nakayama, L. F.; Ribeiro, L. Z.; Shuai, Z.; Barboza, N. S.; Vieira Sousa, M. L.; Fitterman, R. D.; Alves Pereira, A. D.; Saito Regatieri, C. V.; Stuchi, J. A.; Malerbi, F. K.; Andrade, R. E.
This paper introduces mBRSET, the first publicly available retina dataset captured using handheld retinal cameras in real-life, high-burden scenarios, comprising 5,164 images from 1,291 patients of diverse backgrounds. This dataset addresses the lack of ophthalmological data in low- and middle-income countries (LMICs) by providing a cost-effective and accessible solution for ocular screening and management. Portable retinal cameras enable applications outside traditional hospital settings, such as community health screenings and telemedicine consultations, thereby democratizing healthcare. Extensive metadata that are typically unavailable in other datasets, including age, sex, diabetes duration, treatments, and comorbidities, are also recorded. To validate the utility of mBRSET, state-of-the-art deep models, including ConvNeXt V2, Dino V2, and SwinV2, were trained for benchmarking, achieving high accuracy in clinical tasks (diagnosing diabetic retinopathy and macular edema) and in fairness tasks (predicting education and insurance status). The mBRSET dataset serves as a resource for developing AI algorithms and investigating real-world applications, enhancing ophthalmological care in resource-constrained environments.
Zhong, Y.; Yan, W.; Zhang, Y.; Tan, K.; Bian, B.
mRNA serves as a crucial bridge between DNA and proteins. Compared to DNA, mRNA sequences are much more concise and information-dense, which makes mRNA an ideal language through which to explore various biological principles. In this study, we present NUWA, a large mRNA language foundation model leveraging a BERT-like architecture, trained with curriculum masked language modeling and supervised contrastive loss for unified mRNA sequence perception and generation. For pretraining, we utilized large-scale mRNA coding sequences comprising approximately 80 million sequences from 19,676 bacterial species, 33 million from 4,688 eukaryotic species, and 2.1 million from 702 archaeal species, and pre-trained three domain-specific models respectively. This enables NUWA to learn coding sequence patterns across the entire tree of life. The fine-tuned NUWA demonstrates strong performance across a variety of downstream tasks, excelling not only in RNA-related perception tasks but also exhibiting robust capability in cross-modal protein-related tasks. On the generation front, NUWA pioneers an entropy-guided strategy that enables BERT-like models to generate mRNA sequences, producing natural-like sequences that accurately recapitulate species-specific codon usage patterns. Moreover, NUWA can be effectively fine-tuned on small, task-specific datasets to generate functional mRNAs with desired properties, including sequences that do not exist in nature, and to design coding sequences for diverse proteins in biomanufacturing, vaccine development, and therapeutic applications. To our knowledge, NUWA represents the first mRNA language model for unified sequence perception and generation, providing a versatile and programmable platform for mRNA design.
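An entropy-guided decoding strategy of the kind described fills masked positions in order of model confidence: at each step, commit the position whose predicted token distribution has the lowest entropy. A toy sketch with hand-made probability tables standing in for the model (in a real masked LM the predictor would be re-run conditioned on each newly committed token; the static table here only illustrates the ordering rule):

```python
import math

def entropy(probs):
    """Shannon entropy of a discrete distribution (natural log)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def entropy_guided_fill(masked_positions, predict):
    """Iteratively commit the masked position the model is most certain about.

    predict(pos) -> dict mapping candidate tokens to probabilities.
    """
    filled = {}
    remaining = set(masked_positions)
    while remaining:
        # lowest-entropy position = highest-confidence prediction
        pos = min(remaining, key=lambda p: entropy(predict(p).values()))
        dist = predict(pos)
        filled[pos] = max(dist, key=dist.get)   # commit the argmax token
        remaining.remove(pos)
    return filled

# toy predictor: position 1 is near-certain, position 0 is uncertain
TOY = {0: {"A": 0.4, "C": 0.3, "G": 0.3}, 1: {"A": 0.95, "C": 0.05}}
result = entropy_guided_fill([0, 1], lambda p: TOY[p])
```

Committing high-confidence positions first lets later, harder positions condition on already-resolved context, which is what makes iterative generation workable for an encoder-only model.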